Thanks to huge efforts by citizen scientists, a majority of species observations in the Atlas of Living Australia are collected opportunistically, where people record observations incidentally rather than through a recurring monitoring program.
However, these observations are susceptible to things unrelated to the species itself. It might be rainy, it might be too hot, an area might be inaccessible; all of these factors can affect whether people record an observation or not.
The COVID-19 pandemic had a major impact on people’s health, behaviour and travel. In Australia, several shorter lockdowns over 2020-2021 imposed restrictions on people’s movements. Melbourne experienced the longest continuous lockdown in the world. These restrictions limited people to either stay at home or choose activities (mainly exercise) that they could do in their local area. To what extent did COVID-19 affect the number of species observations people made over that time?
Here, we’ll use a highlighted time-series plot to investigate how lockdowns affected the data collection of Anatidae observations (ducks, geese and swans), a group frequently recorded on walks and outdoor gatherings, in Melbourne compared to previous years.
Get data
We’ll start by downloading Anatidae records.
Left: Tadorna (Casarca) tadornoides (Tracey Hinton CC-BY-NC 4.0 (Int)) Middle: Cygnus (Chenopis) atratus (jpshahady CC-BY-NC 4.0 (Int)) Right: Spatula rhynchotis (Annette Green CC-BY-NC 4.0 (Int))
First, let’s load some packages:
# Load packages
library(galah)
library(tidyverse)
library(lubridate)
library(grid)
library(ggplot2)
library(pilot) # remotes::install_github("olihawkins/pilot")
library(showtext)Let’s use the {galah} package to download duck records in Melbourne from 2017-2021 to get records from years just before and during COVID-19 lockdowns.
Searching with galah::search_fields() shows us that {galah} contains Greater Capital City Statistical Areas, which we can use to filter our query.
search_all(fields, "city")# A tibble: 1 × 4
id description type link
<chr> <chr> <chr> <chr>
1 cl10929 PSMA ABS Greater Capital City Statistical Areas (2016) AB… laye… http…
search_all(fields, "city") |> search_values("melbourne")• Showing values for 'cl10929'.
# A tibble: 1 × 2
field category
<chr> <chr>
1 cl10929 GREATER MELBOURNE
Let’s build our query to return Anatidae records from GREATER MELBOURNE from 2017 to 2021. We’ll use galah_select() to return only the eventDate column.
You will need to first provide a registered email with the ALA using galah_config() before retrieving records.
# Add registered email (register at ala.org.au)
galah_config(email = "your-email@email.com")ducks <-
galah_call() |>
galah_identify("Anatidae") |>
galah_filter(
cl10929 == "GREATER MELBOURNE",
# year >= 2017,
# year <= 2021,
eventDate >= "2017-01-01T00:00:00Z",
eventDate <= "2021-12-31T23:59:00Z",
basisOfRecord == "HUMAN_OBSERVATION"
) |>
galah_select(eventDate) |>
atlas_occurrences()
ducks |> head(6L)# A tibble: 6 × 1
eventDate
<dttm>
1 2017-01-01 13:00:00
2 2017-01-01 13:00:00
3 2017-01-01 13:00:00
4 2017-01-01 13:00:00
5 2017-01-01 13:00:00
6 2017-01-01 13:00:00
We’ll pull out some week and year of each date and count the total occurrence records for each week.
ducks_weekly <- ducks |>
mutate(date = as_date(eventDate),
year = year(eventDate),
week = week(eventDate)) |>
group_by(year, week) |>
summarise(week_count = n())`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
ducks_weekly # A tibble: 265 × 3
# Groups: year [5]
year week week_count
<dbl> <dbl> <int>
1 2017 1 647
2 2017 2 670
3 2017 3 665
4 2017 4 790
5 2017 5 745
6 2017 6 580
7 2017 7 658
8 2017 8 680
9 2017 9 575
10 2017 10 541
# … with 255 more rows
We are intending to compare record counts in 2020-2021 to previous years. However, because record numbers added to the ALA have grown each year, it’s likely that comparing raw numbers will show a biased view that there were more records than previous years.
To avoid this, let’s scale our record counts by the total number of Anatidae records each year so that we compare proportions rather than raw numbers.
First let’s download the total Anatidae records for each year.
ducks_yearly <-
galah_call() |>
galah_identify("Anatidae") |>
galah_filter(cl10929 == "GREATER MELBOURNE",
year >= 2017, year <= 2021) |>
galah_group_by(year) |>
atlas_counts() |>
rename(year_total = count) |>
mutate(year = as.numeric(year)) |>
arrange(-desc(year))
ducks_yearly# A tibble: 5 × 2
year year_total
<dbl> <int>
1 2017 37194
2 2018 48696
3 2019 47874
4 2020 48850
5 2021 57520
Now we’ll split ducks_weekly into 5 data.frames for each year and divide the record count by the total for each year. At the end we’ll bind the separate data.frames into one again.
ducks_scaled <- ducks_weekly %>%
split(.$year) %>%
map2(.x = .,
.y = ducks_yearly$year_total,
~ .x %>%
mutate(prop = .x$week_count / .y * 100) %>%
select(-week_count)
) %>%
bind_rows()
ducks_scaled# A tibble: 265 × 3
# Groups: year [5]
year week prop
<dbl> <dbl> <dbl>
1 2017 1 1.74
2 2017 2 1.80
3 2017 3 1.79
4 2017 4 2.12
5 2017 5 2.00
6 2017 6 1.56
7 2017 7 1.77
8 2017 8 1.83
9 2017 9 1.55
10 2017 10 1.45
# … with 255 more rows
Finally, we’ll place our weekly proportions in separate columns with pivot_wider().
In our final plot we want to compare the “normal” number of records collected to those in 2020 and 2021, and it would be nice to represent the “normal” number as one line representing the average in “normal” years. With that goal in mind, we’ll also calculate the mean proportion of records each week from 2017-2019.
ducks_scaled <- ducks_scaled %>%
pivot_wider(names_from = year,
values_from = prop,
names_sort = TRUE,
names_glue = "year_{year}") |>
rowwise() |>
mutate(mean_2017_19 = mean(c_across(year_2017:year_2019)))
ducks_scaled# A tibble: 53 × 7
# Rowwise:
week year_2017 year_2018 year_2019 year_2020 year_2021 mean_2017_19
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1.74 1.74 3.22 2.83 2.18 2.23
2 2 1.80 1.89 2.95 1.91 1.86 2.21
3 3 1.79 1.56 2.65 2.51 2.02 2.00
4 4 2.12 1.53 2.30 2.43 1.75 1.99
5 5 2.00 1.62 1.85 1.59 2.03 1.83
6 6 1.56 1.34 1.78 1.75 2.05 1.56
7 7 1.77 1.77 1.87 1.47 1.57 1.80
8 8 1.83 1.37 1.98 1.54 1.90 1.73
9 9 1.55 1.45 1.33 1.28 1.84 1.44
10 10 1.45 1.25 2.12 1.17 1.67 1.61
# … with 43 more rows
Now we have our dataset, it needs a bit of reorganising to make it suitable for plotting. Counts are grouped by week of the year, meaning we have 2 sets of weeks 1-52 (one for 2020, one for 2021). To plot these in order along an axis we need to convert this to 1-53 for 2020 (a leap year, extra week), then 54-106 for 2021.
# 2021 record count proportions
ducks_scaled_2021 <- ducks_scaled |>
select(week,
prop = year_2021,
mean_2017_19) |>
mutate(week = 53 + week)
ducks_scaled_2021 |> head(5L)# A tibble: 5 × 3
# Rowwise:
week prop mean_2017_19
<dbl> <dbl> <dbl>
1 54 2.18 2.23
2 55 1.86 2.21
3 56 2.02 2.00
4 57 1.75 1.99
5 58 2.03 1.83
# 2020 + 2021 record count proportions
ducks_final <- ducks_scaled |>
select(week,
prop = year_2020,
mean_2017_19) |>
bind_rows(ducks_scaled_2021)
ducks_final# A tibble: 106 × 3
# Rowwise:
week prop mean_2017_19
<dbl> <dbl> <dbl>
1 1 2.83 2.23
2 2 1.91 2.21
3 3 2.51 2.00
4 4 2.43 1.99
5 5 1.59 1.83
6 6 1.75 1.56
7 7 1.47 1.80
8 8 1.54 1.73
9 9 1.28 1.44
10 10 1.17 1.61
# … with 96 more rows
Lockdowns
During the height of the pandemic, Melbourne had 6 distinct lockdowns. Let’s add their start and end dates to a table.
n_lockdown <- c(1:6)
start_date <- c("2020-03-31", "2020-07-09",
"2021-02-13", "2021-05-28",
"2021-07-16", "2021-08-05")
end_date <- c("2020-05-12", "2020-10-27",
"2021-02-17", "2021-06-10",
"2021-07-27", "2021-10-21")
lockdowns <- tibble(n_lockdown, start_date, end_date) |>
mutate(
n_days = as_date(ymd(end_date)) - as_date(ymd(start_date)),
weekstart = week(start_date),
weekend = week(end_date))
lockdowns # A tibble: 6 × 6
n_lockdown start_date end_date n_days weekstart weekend
<int> <chr> <chr> <drtn> <dbl> <dbl>
1 1 2020-03-31 2020-05-12 42 days 13 19
2 2 2020-07-09 2020-10-27 110 days 28 43
3 3 2021-02-13 2021-02-17 4 days 7 7
4 4 2021-05-28 2021-06-10 13 days 22 23
5 5 2021-07-16 2021-07-27 11 days 29 30
6 6 2021-08-05 2021-10-21 77 days 31 42
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.2.2 (2022-10-31 ucrt)
os Windows 10 x64 (build 19044)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2023-03-10
pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.0 2023-01-29 [1] CRAN (R 4.2.2)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.2)
galah * 1.5.1 2023-02-21 [1] Github (AtlasOfLivingAustralia/galah@bd43dd2)
ggplot2 * 3.4.1 2023-02-10 [1] CRAN (R 4.2.2)
htmltools * 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
lubridate * 1.9.2 2023-02-10 [1] CRAN (R 4.2.2)
pilot * 4.0.0 2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.2.2)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
showtext * 0.9-5 2022-02-09 [1] CRAN (R 4.2.1)
showtextdb * 3.0 2020-06-04 [1] CRAN (R 4.2.1)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
sysfonts * 0.8.8 2022-03-13 [1] CRAN (R 4.2.1)
tibble * 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.2)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.2.2)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.2.2/library
──────────────────────────────────────────────────────────────────────────────